Last updated: 2025-06-26
Checks: 7 passed, 0 failed
Knit directory: mapelli_numa/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20250522) was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version b4f754b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: data/
Unstaged changes:
Modified: analysis/module_rnaseq_report.Rmd
Modified: analysis/module_rnaseq_report_main.Rmd
Modified: src/__utils_rna_seq_functions.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/02_numa_vs_nonuma_s.Rmd)
and HTML (docs/02_numa_vs_nonuma_s.html) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote),
click on the hyperlinks in the table below to view the files as they
were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | b4f754b | Mariani_Gianluca_Alessio | 2025-06-26 | Added all reports |
| Rmd | 5adc6a6 | Yinxiu Zhan | 2025-06-24 | :sparkles: Add new files |
knitr::opts_chunk$set(echo = FALSE,
                      message = FALSE,
                      warning = FALSE,
                      cache = FALSE,
                      autodep = TRUE,
                      fig.align = 'center',
                      fig.width = 10,
                      fig.height = 8)
The objective of this report is to investigate differential gene expression between the two conditions and to conduct gene ontology enrichment analysis to explore the biological functions involved.
Below is the list of parameters used in this report to define differential gene expression.
Below we show the comparison group considered for the analysis presented in this report. Each group contains all the samples associated with the specific condition under analysis.
The group is divided into experimental samples and control samples.
Each differential gene expression comparison will be conducted between these two groups.
Group considered:
- Experimental samples (Synchronized NuMA-, e):
  - S61885_S-NuMA_A
  - S61886_S-NuMA_B
  - S61887_S-NuMA_C
- Control samples (Synchronized NuMA+, c/tc):
  - S61882_S_plus_NuMA_A
  - S61883_S_plus_NuMA_B
  - S61884_S_plus_NuMA_C
The RNAseq data for this analysis:
The sample population includes:
No technical control samples are present, so no background correction is applied. All analyses are based on normalized counts and a model that considers only the experimental condition. Lowly expressed genes are removed to reduce noise. Lowly expressed genes are here considered as:
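The filtering rule itself is not reproduced in this page. As a minimal sketch of how such a filter is typically applied, the example below uses a toy count matrix and a hypothetical rule (`min_count` reads in at least `min_samples` samples); both cutoffs and all counts are assumptions for illustration, not the values used in this report.

```r
# Toy raw count matrix: 5 genes x 6 samples (values invented for illustration).
counts <- matrix(
  c(  0,   1,   0,   2,   0,   1,
    120,  98, 143, 110, 131, 105,
     15,  22,   9,  30,  18,  12,
      3,   0,   4,   1,   2,   0,
    560, 610, 495, 580, 602, 577),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6))
)

# Hypothetical cutoffs (NOT the report's): keep a gene only if it has at
# least `min_count` reads in at least `min_samples` samples.
min_count   <- 10
min_samples <- 3

keep     <- rowSums(counts >= min_count) >= min_samples
filtered <- counts[keep, , drop = FALSE]

print(rownames(filtered))
```

Requiring the minimum count in several samples, rather than in total, avoids keeping a gene because of a single outlier sample.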
Below we present the PCA analysis conducted on the two specific conditions analyzed in this report.
Although the samples do not form perfectly distinct clusters, the first principal component (PC1) clearly separates the experimental and control groups. This indicates that the condition is the main source of variation in the data, which supports the validity of the samples.
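To make the idea of "PC1 separates the groups" concrete, here is a small base R sketch on simulated data. The expression matrix, the sample names and the size of the condition effect are all invented; the report's own PCA is computed on its normalized counts, whose exact transformation is not shown here.

```r
set.seed(1)

# Simulated log-scale expression: 50 genes x 6 samples (invented data).
n_genes <- 50
expr <- matrix(
  rnorm(n_genes * 6, mean = 8, sd = 0.5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  c("exp_A", "exp_B", "exp_C", "ctrl_A", "ctrl_B", "ctrl_C"))
)

# Add a condition effect to the first 10 genes of the experimental samples.
expr[1:10, 1:3] <- expr[1:10, 1:3] + 3

# prcomp() expects observations (samples) in rows, so transpose.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Fraction of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first principal component.
pc1 <- pca$x[, "PC1"]
print(round(var_explained, 3))
print(round(pc1, 2))
```

When the condition effect dominates, the experimental and control samples fall on opposite sides of PC1, even if they scatter along the remaining components.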
The MA plot is a widely used visualization in differential expression analysis that displays the relationship between the average expression (A) and the log fold change (M) for each gene. The x-axis represents the mean expression level across samples, while the y-axis shows the log fold change between groups.
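As an illustration of the two axes, the sketch below computes A and M for a toy matrix of normalized counts. It uses plain group means and a simple ratio; this is an assumption made for clarity, since model-based estimates (as produced by a differential expression package) will generally differ from raw ratios.

```r
# Toy normalized counts: 3 genes x 6 samples (invented values).
# Columns 1-3 are experimental samples, columns 4-6 are controls.
norm_counts <- matrix(
  c(100, 100, 100,  50,  50,  50,
     10,  10,  10,  40,  40,  40,
    200, 200, 200, 200, 200, 200),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("exp_A", "exp_B", "exp_C", "ctrl_A", "ctrl_B", "ctrl_C"))
)

mean_exp  <- rowMeans(norm_counts[, 1:3])
mean_ctrl <- rowMeans(norm_counts[, 4:6])

# A: mean expression across all samples (x-axis).
A <- rowMeans(norm_counts)

# M: log2 fold change, experimental over control (y-axis).
M <- log2(mean_exp / mean_ctrl)

print(data.frame(A = A, M = M))
```

A gene with no change between groups sits on the horizontal line M = 0, regardless of its expression level.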
Total number of significant genes: 266
The Volcano plot is a graphical method to visualize differential expression results by combining statistical significance and magnitude of change for each gene. It plots the log2 fold change on the x-axis against the negative log10 of the p-value (or adjusted p-value) on the y-axis.
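The coordinates of a volcano plot can be derived directly from a results table. The following sketch uses a made-up table with the column names described in this report; the gene names and values are placeholders. Genes without an adjusted p-value are dropped before plotting, which is an assumption about how missing values are handled.

```r
# Made-up results table (placeholder gene names and values).
res <- data.frame(
  gene           = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.5, -1.8, 0.2, -0.1),
  padj           = c(1e-6, 0.001, 0.5, NA)
)

# x = effect size, y = significance on a -log10 scale.
volcano <- data.frame(
  gene = res$gene,
  x    = res$log2FoldChange,
  y    = -log10(res$padj)
)

# Genes with a missing adjusted p-value cannot be placed on the y-axis.
volcano <- volcano[!is.na(volcano$y), ]
print(volcano)
```

On this scale, smaller adjusted p-values map to larger y values, so the most significant genes appear at the top of the plot.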
Below we present two tables: the first includes all the genes identified in the analysis, while the second includes only the differentially expressed genes (DEGs).
The columns in the table are:
- baseMean: The average normalized count of a gene across all samples, reflecting its overall expression level in the dataset.
- log2FoldChange: The estimated log2-transformed fold change in expression between two conditions (e.g., experimental vs control). Positive values indicate upregulation, negative values indicate downregulation.
- lfcSE: The standard error of the log2 fold change estimate, indicating the uncertainty of the fold change measurement.
- stat: The test statistic calculated for the hypothesis test of whether the log2 fold change differs from zero.
- pvalue: The raw p-value of the statistical test for differential expression; it is the probability of observing data at least as extreme as the observed data, assuming no true difference in expression.
- padj: The p-value adjusted for multiple testing (e.g., using the Benjamini-Hochberg method) to control the false discovery rate (FDR).
- comparison_exp_vs_contr: A label describing the comparison made, specifying which condition is experimental and which is control.
- gene: The unique Ensembl identifier for each gene as annotated in the reference genome.
- symbol: The gene symbol or common gene name, which is easier to interpret biologically than Ensembl IDs.
- FoldChange: The fold change in linear scale (non-logarithmic), derived from log2FoldChange (i.e., 2^(log2FoldChange)), representing how many times expression has changed.
- differentially_expressed: A categorical variable indicating whether the gene is considered differentially expressed (e.g., “yes” or “no”) based on the predefined thresholds for significance and fold change described in the next section.
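Several of these columns are simple functions of the others. The sketch below shows how they relate on a made-up table. It assumes a Wald-type test (statistic = estimate / standard error, two-sided normal p-value) and uses hypothetical cutoffs `padj_cutoff` and `lfc_cutoff`; neither the test type nor the cutoffs are taken from this report.

```r
# Made-up estimates for four genes (placeholder IDs, invented values).
res <- data.frame(
  gene           = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  log2FoldChange = c(2, -1.5, 0.3, 1),
  lfcSE          = c(0.4, 0.5, 0.3, 0.8)
)

# Assumption: Wald-type statistic and two-sided normal p-value.
res$stat   <- res$log2FoldChange / res$lfcSE
res$pvalue <- 2 * pnorm(-abs(res$stat))

# Benjamini-Hochberg adjustment to control the false discovery rate.
res$padj <- p.adjust(res$pvalue, method = "BH")

# Linear-scale fold change.
res$FoldChange <- 2^res$log2FoldChange

# Hypothetical thresholds (NOT the report's values).
padj_cutoff <- 0.05
lfc_cutoff  <- 1

res$differentially_expressed <- ifelse(
  res$padj < padj_cutoff & abs(res$log2FoldChange) >= lfc_cutoff,
  "yes", "no"
)
print(res)
```

Note that a gene with a large fold change but a large standard error (the last row here) is not flagged, because its adjusted p-value stays above the cutoff.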
In this table we can find a subset of the previous table that includes the differentially expressed genes (DEGs).
The genes defined as DEGs need to satisfy these two conditions:
Starting from the differentially expressed genes computed above, we visualize first the top 20 genes and then all DE genes.
Meaning of Colors
This heatmap presents the expression profiles of the top 20 genes showing the most statistically significant differential expression, ranked by their adjusted p-values (padj). These genes represent the strongest candidates for biologically relevant changes between conditions. Displaying normalized and scaled expression values across samples, this focused visualization highlights the distinct expression patterns of the most significant genes, facilitating interpretation of key transcriptional differences driving the experimental effects.
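The selection and scaling steps described above can be sketched in base R as follows. The expression matrix and adjusted p-values are simulated, and row-wise z-scoring is assumed as the scaling method; the report's actual transformation may differ.

```r
set.seed(42)

# Simulated transformed expression: 100 genes x 6 samples (invented data).
n_genes <- 100
expr <- matrix(
  rnorm(n_genes * 6, mean = 8),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("sample", 1:6))
)

# Simulated adjusted p-values, one per gene.
padj <- setNames(runif(n_genes), rownames(expr))

# Rank genes by adjusted p-value and keep the 20 most significant.
top_genes <- head(names(sort(padj)), 20)
top_expr  <- expr[top_genes, ]

# Row-wise z-score: scale() works on columns, so transpose around it.
scaled <- t(scale(t(top_expr)))

print(dim(scaled))
```

Scaling each gene to mean 0 and standard deviation 1 makes the colors comparable across genes with very different absolute expression levels.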

This heatmap displays the expression levels of all genes detected in the RNA-seq dataset across all samples. The values are normalized and transformed (e.g., via variance stabilizing transformation) to allow comparison across genes and samples. This comprehensive visualization provides an overview of the global expression patterns, highlighting overall similarities and differences between samples, as well as potential outliers.

Further analysis is done through gene set enrichment analysis (GSEA), which, unlike the previous steps, does not exclude genes based on log2 fold change or adjusted p-value. GSEA is performed separately on each subontology: biological processes (BP), cellular components (CC) and molecular functions (MF). The dot plots below show all the enriched GO terms. The size of each dot reflects the number of genes associated with the GO term, and its color reflects the significance of the enrichment.
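Because GSEA uses every gene, its input is a ranked list rather than a filtered subset. The sketch below builds such a list as a named numeric vector sorted in decreasing order. Ranking by log2FoldChange is an assumption (the ranking metric is not stated in this page), and the gene names and values are placeholders.

```r
# Made-up results (placeholder gene names; one gene has no estimate).
res <- data.frame(
  gene           = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(0.4, -2.1, 1.7, NA, -0.3)
)

# Drop genes without an estimate; no significance filter is applied.
res <- res[!is.na(res$log2FoldChange), ]

# Named numeric vector, sorted from most up- to most down-regulated.
gene_list <- setNames(res$log2FoldChange, res$gene)
gene_list <- sort(gene_list, decreasing = TRUE)

print(gene_list)
```

A vector of this shape is the kind of input that GSEA functions such as those in clusterProfiler expect, typically with gene identifiers matching the annotation database in use.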
Dot plots (one per subontology): Synchronized NuMA- vs Synchronized NuMA+, P value cutoff: 0.01

We performed a functional enrichment analysis based on Over-Representation Analysis (ORA) using the Reactome pathway database. Unlike GSEA, which considers the entire ranked list of genes, ORA focuses only on genes that meet specific differential expression thresholds (e.g., adjusted p-value and log2 fold change). The analysis was conducted separately for upregulated and downregulated genes to identify Reactome pathways that are significantly enriched in each group, compared to what would be expected by chance. This allows for a clearer biological interpretation of distinct transcriptional programs activated or suppressed in the dataset. The dot plots below display all significantly enriched Reactome pathways. Each dot’s size represents the number of differentially expressed genes associated with the pathway, while the color reflects the statistical significance of the enrichment (adjusted p-value).
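The phrase "compared to what would be expected by chance" corresponds to a hypergeometric test. The sketch below works through one pathway with invented numbers (universe size, pathway size, number of DEGs and overlap are all hypothetical), using base R only rather than the Reactome tooling.

```r
# Hypothetical counts for a single pathway (all values invented).
universe_size <- 10000  # genes tested in total
pathway_size  <- 200    # genes annotated to the pathway
n_deg         <- 150    # differentially expressed genes in one direction
overlap       <- 12     # DEGs that belong to the pathway

# Overlap expected by chance if DEGs were drawn at random.
expected <- n_deg * pathway_size / universe_size

# P(X >= overlap) under the hypergeometric distribution.
# phyper() gives P(X <= q), so use q = overlap - 1 with the upper tail.
p_value <- phyper(overlap - 1,
                  m = pathway_size,
                  n = universe_size - pathway_size,
                  k = n_deg,
                  lower.tail = FALSE)

print(expected)
print(p_value)
```

In a full analysis this test is repeated for every pathway, and the resulting p-values are then adjusted for multiple testing before being shown in the dot plot.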
Dot plots (upregulated and downregulated genes): Synchronized NuMA- vs Synchronized NuMA+
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] ReactomePA_1.53.0 CorLevelPlot_0.99.0
[3] tibble_3.3.0 limma_3.65.1
[5] org.Hs.eg.db_3.21.0 AnnotationDbi_1.71.0
[7] git2r_0.36.2 gridExtra_2.3
[9] WGCNA_1.73 fastcluster_1.3.0
[11] dynamicTreeCut_1.63-1 dplyr_1.1.4
[13] clusterProfiler_4.17.0 reshape_0.8.9
[15] DT_0.33 gplots_3.2.0
[17] RColorBrewer_1.1-3 rtracklayer_1.69.0
[19] DESeq2_1.49.1 SummarizedExperiment_1.39.0
[21] Biobase_2.69.0 MatrixGenerics_1.21.0
[23] matrixStats_1.5.0 GenomicRanges_1.61.0
[25] GenomeInfoDb_1.45.4 IRanges_2.43.0
[27] S4Vectors_0.47.0 BiocGenerics_0.55.0
[29] generics_0.1.4 ComplexHeatmap_2.25.0
[31] plotly_4.10.4 ggplot2_3.5.2
[33] yaml_2.3.10 workflowr_1.7.1
loaded via a namespace (and not attached):
[1] splines_4.5.0 later_1.4.2 BiocIO_1.19.0
[4] bitops_1.0-9 ggplotify_0.1.2 R.oo_1.27.1
[7] polyclip_1.10-7 preprocessCore_1.71.0 graph_1.87.0
[10] rpart_4.1.24 XML_3.99-0.18 lifecycle_1.0.4
[13] doParallel_1.0.17 rprojroot_2.0.4 MASS_7.3-65
[16] processx_3.8.6 lattice_0.22-7 crosstalk_1.2.1
[19] backports_1.5.0 magrittr_2.0.3 Hmisc_5.2-3
[22] sass_0.4.10 rmarkdown_2.29 jquerylib_0.1.4
[25] httpuv_1.6.16 ggtangle_0.0.6 cowplot_1.1.3
[28] DBI_1.2.3 abind_1.4-8 purrr_1.0.4
[31] R.utils_2.13.0 ggraph_2.2.1 RCurl_1.98-1.17
[34] nnet_7.3-20 yulab.utils_0.2.0 rappdirs_0.3.3
[37] tweenr_2.0.3 circlize_0.4.16 enrichplot_1.29.1
[40] ggrepel_0.9.6 tidytree_0.4.6 reactome.db_1.92.0
[43] codetools_0.2-20 DelayedArray_0.35.1 ggforce_0.5.0
[46] DOSE_4.3.0 tidyselect_1.2.1 shape_1.4.6.1
[49] aplot_0.2.5 UCSC.utils_1.5.0 farver_2.1.2
[52] viridis_0.6.5 base64enc_0.1-3 GenomicAlignments_1.45.0
[55] jsonlite_2.0.0 GetoptLong_1.0.5 tidygraph_1.3.1
[58] Formula_1.2-5 survival_3.8-3 iterators_1.0.14
[61] foreach_1.5.2 tools_4.5.0 treeio_1.33.0
[64] Rcpp_1.0.14 glue_1.8.0 SparseArray_1.9.0
[67] xfun_0.52 qvalue_2.41.0 withr_3.0.2
[70] fastmap_1.2.0 callr_3.7.6 caTools_1.18.3
[73] digest_0.6.37 R6_2.6.1 gridGraphics_0.5-1
[76] colorspace_2.1-1 GO.db_3.21.0 gtools_3.9.5
[79] RSQLite_2.4.0 R.methodsS3_1.8.2 tidyr_1.3.1
[82] data.table_1.17.6 graphlayouts_1.2.2 httr_1.4.7
[85] htmlwidgets_1.6.4 S4Arrays_1.9.1 graphite_1.55.0
[88] whisker_0.4.1 pkgconfig_2.0.3 gtable_0.3.6
[91] blob_1.2.4 impute_1.83.0 XVector_0.49.0
[94] htmltools_0.5.8.1 fgsea_1.35.0 clue_0.3-66
[97] scales_1.4.0 png_0.1-8 ggfun_0.1.8
[100] knitr_1.50 rstudioapi_0.17.1 reshape2_1.4.4
[103] rjson_0.2.23 checkmate_2.3.2 nlme_3.1-168
[106] curl_6.4.0 cachem_1.1.0 GlobalOptions_0.1.2
[109] stringr_1.5.1 KernSmooth_2.23-26 parallel_4.5.0
[112] foreign_0.8-90 restfulr_0.0.15 pillar_1.10.2
[115] vctrs_0.6.5 promises_1.3.3 cluster_2.1.8.1
[118] htmlTable_2.4.3 evaluate_1.0.4 cli_3.6.5
[121] locfit_1.5-9.12 compiler_4.5.0 Rsamtools_2.25.0
[124] rlang_1.1.6 crayon_1.5.3 labeling_0.4.3
[127] ps_1.9.1 getPass_0.2-4 plyr_1.8.9
[130] fs_1.6.6 stringi_1.8.7 viridisLite_0.4.2
[133] BiocParallel_1.42.1 Biostrings_2.77.1 lazyeval_0.2.2
[136] GOSemSim_2.35.0 Matrix_1.7-3 patchwork_1.3.0
[139] bit64_4.6.0-1 statmod_1.5.0 KEGGREST_1.49.0
[142] igraph_2.1.4 memoise_2.0.1 bslib_0.9.0
[145] ggtree_3.17.0 fastmatch_1.1-6 bit_4.6.0
[148] ape_5.8-1 gson_0.1.0